AITopics | fine-grained visual prompting

Supplementary Materials for "Fine-Grained Visual Prompting" Lingfeng Yang, Yueze Wang

Neural Information Processing Systems, Feb-11-2026, 16:03:20 GMT

By applying a single blur operation, we can retain more spatial relevance information. Moreover, since the images are blurred, they may have a relatively minor impact on the recognition ability of CLIP on the target.

ablation study, artificial intelligence, natural language, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Jiangsu Province > Nanjing (0.05)
Asia > China > Beijing > Beijing (0.05)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.72)

4e9fa6e716940a7cfc60c46e6f702f52-Paper-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 16:03:17 GMT

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > California (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Fine-Grained Visual Prompting

Neural Information Processing Systems, Dec-25-2025, 04:12:45 GMT

Vision-Language Models (VLMs), such as CLIP, have demonstrated impressive zero-shot transfer capabilities in image-level visual perception. However, these models have shown limited performance in instance-level tasks that demand precise localization and recognition. Previous works have suggested that incorporating visual prompts, such as colorful boxes or circles, can improve the ability of models to recognize objects of interest. Nonetheless, compared to language prompting, visual prompting designs are rarely explored. Existing approaches, which employ coarse visual cues such as colorful boxes or circles, often result in sub-optimal performance due to the inclusion of irrelevant and noisy pixels. In this paper, we carefully study the visual prompting designs by exploring more fine-grained markings, such as segmentation masks and their variations. In addition, we introduce a new zero-shot framework that leverages pixel-level annotations acquired from a generalist segmentation model for fine-grained visual prompting. Consequently, our investigation reveals that a straightforward application of blur outside the target mask, referred to as the Blur Reverse Mask, exhibits exceptional effectiveness. This proposed prompting strategy leverages the precise mask annotations to reduce focus on weakly related regions while retaining spatial coherence between the target and the surrounding background.
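The Blur Reverse Mask idea described in the abstract (blur everything outside the target mask, keep the target sharp) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `box_blur` and `blur_reverse_mask`, the grayscale list-of-lists image format, and the use of a simple mean filter in place of whatever blur kernel the paper uses are all assumptions made for illustration. In the paper's framework the mask would come from a segmentation model; here it is supplied by hand.

```python
from typing import List

Image = List[List[float]]  # grayscale image, row-major (assumed format)
Mask = List[List[bool]]    # True marks pixels inside the target


def box_blur(image: Image, radius: int) -> Image:
    """Mean filter over a (2*radius+1)^2 window, clipped at the image borders."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            blurred[y][x] = sum(window) / len(window)
    return blurred


def blur_reverse_mask(image: Image, mask: Mask, radius: int = 1) -> Image:
    """Keep pixels inside the mask sharp; use a blurred copy everywhere else."""
    blurred = box_blur(image, radius)  # a single blur pass over the whole image
    width = len(image[0])
    return [
        [image[y][x] if mask[y][x] else blurred[y][x] for x in range(width)]
        for y in range(len(image))
    ]


if __name__ == "__main__":
    # Toy 4x4 image with a bright 2x2 target in the centre.
    image = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 8.0, 8.0, 0.0],
        [0.0, 8.0, 8.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    mask = [[1 <= y <= 2 and 1 <= x <= 2 for x in range(4)] for y in range(4)]
    for row in blur_reverse_mask(image, mask):
        print(row)
```

Because the background is blurred rather than cropped or blacked out, the prompted image still carries the layout around the target, which is the spatial coherence the abstract refers to. The prompted image would then be scored against text by a model such as CLIP; that step is omitted here.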

fine-grained visual prompting, name change, visual prompting design, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.52)
